Explainable Boosting Machine#
See the reference paper for full details [1].
Summary#
Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable. Although EBMs are often slower to train than other modern algorithms, EBMs are extremely compact and fast at prediction time.
How it Works#
As part of the framework, InterpretML also includes a new interpretability algorithm – the Explainable Boosting Machine (EBM). EBM is a glassbox model, designed to have accuracy comparable to state-of-the-art machine learning methods like Random Forest and Boosted Trees, while being highly intelligible and explainable. EBM is a generalized additive model (GAM) of the form:

\[g(E[y]) = \beta_0 + \sum_j f_j(x_j)\]

where \(g\) is the link function that adapts the GAM to different settings such as regression or classification.
EBM has a few major improvements over traditional GAMs [2]. First, EBM learns each feature function \(f_j\) using modern machine learning techniques such as bagging and gradient boosting. The boosting procedure is carefully restricted to train on one feature at a time in round-robin fashion, using a very low learning rate so that feature order does not matter. Cycling through the features this way mitigates the effects of co-linearity and lets EBM learn the best feature function \(f_j\) for each feature, showing how each feature contributes to the model’s prediction. Second, EBM can automatically detect and include pairwise interaction terms of the form:

\[g(E[y]) = \beta_0 + \sum_j f_j(x_j) + \sum_{i \neq j} f_{i,j}(x_i, x_j)\]
which further increases accuracy while maintaining intelligibility. EBM is a fast implementation of the GA2M algorithm [1], written in C++ and Python. The implementation is parallelizable, and takes advantage of joblib to provide multi-core and multi-machine parallelization. The algorithmic details for the training procedure, selection of pairwise interaction terms, and case studies can be found in [1, 3, 4].
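As a toy illustration of this additive structure (every table, bin edge, and value below is made up for the sketch, not taken from the library's internals), a GA2M-style prediction with two main effects and one pairwise term can be computed with simple lookups:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned shape functions, stored as per-bin score tables.
bin_edges_age = np.array([30.0, 50.0])   # 3 bins for "age"
f_age = np.array([-0.8, 0.1, 0.6])       # one score per age bin
bin_edges_hours = np.array([35.0])       # 2 bins for "hours per week"
f_hours = np.array([-0.3, 0.4])
f_age_hours = np.array([                 # 3x2 table for the pairwise term
    [-0.1, 0.2],
    [ 0.0, 0.1],
    [ 0.1, 0.3],
])
intercept = -0.5

def predict_proba_one(age, hours):
    i = np.searchsorted(bin_edges_age, age)      # locate the age bin
    j = np.searchsorted(bin_edges_hours, hours)  # locate the hours bin
    score = intercept + f_age[i] + f_hours[j] + f_age_hours[i, j]
    return sigmoid(score)                        # logit link for classification

p = predict_proba_one(age=42, hours=40)
```

Each term is just an indexed table lookup; the interaction term is a two-dimensional table indexed by both bins, which is why adding pairwise terms keeps the model intelligible.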
EBMs are highly intelligible because the contribution of each feature to a final prediction can be visualized and understood by plotting \(f_j\). Because EBM is an additive model, each feature contributes to predictions in a modular way, making it easy to reason about each feature's role in a prediction.
To make individual predictions, each function \(f_j\) acts as a lookup table per feature, and returns a term contribution. These term contributions are simply added up, and passed through the link function \(g\) to compute the final prediction. Because of the modularity (additivity), term contributions can be sorted and visualized to show which features had the most impact on any individual prediction.
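A minimal sketch of this sort-and-visualize step, using hypothetical term names and contributions (not values from a real fitted EBM):

```python
import numpy as np

# Hypothetical term contributions for a single prediction; the names and
# numbers are illustrative only.
term_names = ["Age", "Education", "HoursPerWeek", "CapitalGain"]
contributions = np.array([0.6, -0.2, 0.05, 1.1])
intercept = -0.4

# Rank terms by how much they moved this prediction, largest magnitude first.
order = np.argsort(-np.abs(contributions))
ranked = [(term_names[k], contributions[k]) for k in order]

# The final score is the intercept plus the sum of term contributions,
# passed through the link function g (a sigmoid for binary classification).
score = intercept + contributions.sum()
probability = 1.0 / (1.0 + np.exp(-score))
```

Because the model is additive, this ranking is exact for the individual prediction rather than an approximation, which is what makes EBM's local explanations faithful to the model.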
To keep the individual terms additive, EBM pays an additional training cost, making it somewhat slower than similar methods. However, because making predictions involves only simple additions and lookups inside the feature functions \(f_j\), EBMs are one of the fastest models to execute at prediction time. EBM’s light memory usage and fast prediction times make it particularly attractive for model deployment in production.
If you find video a better medium for learning the algorithm, you can find a conceptual overview below:

Code Example#
The following code trains an EBM classifier on the adult income dataset and produces visualizations for both global and local explanations.
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None,
)
df.columns = [
    "Age", "WorkClass", "fnlwgt", "Education", "EducationNum",
    "MaritalStatus", "Occupation", "Relationship", "Race", "Gender",
    "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income",
]
seed = 42
np.random.seed(seed)
# Subsample the dataset to keep the example fast.
df = df.sample(frac=0.05, random_state=seed)
train_cols = df.columns[0:-1]
label = df.columns[-1]
X = df[train_cols]
y = df[label]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=seed)

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)

# Global explanation: per-feature shape functions and overall term importances.
ebm_global = ebm.explain_global()
show(ebm_global)

# Local explanation: term contributions for the first five test samples.
ebm_local = ebm.explain_local(X_test[:5], y_test[:5])
show(ebm_local, 0)
Further Resources#
Bibliography#
[1] Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 623–631. 2013.
[2] Trevor Hastie and Robert Tibshirani. Generalized additive models: some applications. Journal of the American Statistical Association, 82(398):371–386, 1987.
[3] Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 150–158. 2012.
[4] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. Intelligible models for healthcare: predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 1721–1730. 2015.
API#
ExplainableBoostingClassifier#
- class interpret.glassbox.ExplainableBoostingClassifier(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Explainable Boosting Classifier. The arguments will change in a future release; watch the changelog.
- Parameters:
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile”, or “rounded_quantile”. ‘rounded_quantile’ will round to as few decimals as possible while preserving the same bins as ‘quantile’.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions. Interactions are forcefully set to 0 for multiclass problems.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
- decision_function(X)#
Predict scores from model before calling the link function.
- Parameters:
X – Numpy array for samples.
- Returns:
The sum of the additive term contributions.
- explain_global(name=None)#
Provides global explanation for model.
- Parameters:
name – User-defined explanation name.
- Returns:
An explanation object, visualizing feature-value pairs as a horizontal bar chart.
- explain_local(X, y=None, name=None)#
Provides local explanations for provided samples.
- Parameters:
X – Numpy array for X to explain.
y – Numpy vector for y to explain.
name – User-defined explanation name.
- Returns:
An explanation object, visualizing feature-value pairs for each sample as horizontal bar charts.
- explainer_type = 'model'#
Public facing EBM classifier.
- fit(X, y, sample_weight=None)#
Fits model to provided samples.
- Parameters:
X – Numpy array for training samples.
y – Numpy array as training labels.
sample_weight – Optional array of weights per sample. Should be same length as X and y.
- Returns:
Itself.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)#
Predicts on provided samples.
- Parameters:
X – Numpy array for samples.
- Returns:
Predicted class label per sample.
- predict_and_contrib(X, output='probabilities')#
Predicts on provided samples, returning predictions and explanations for each sample.
- Parameters:
X – Numpy array for samples.
output – Prediction type to output (one of ‘probabilities’, ‘labels’, ‘logits’).
- Returns:
Predictions and local explanations for each sample.
- predict_proba(X)#
Probability estimates on provided samples.
- Parameters:
X – Numpy array for samples.
- Returns:
Probability estimate of sample for each class.
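Assuming a logit link for binary classification (an assumption for illustration, consistent with the description earlier in this document), the relationship between decision_function scores and predict_proba can be sketched as:

```python
import numpy as np

# Hypothetical decision_function scores (log-odds) for three samples.
scores = np.array([-2.0, 0.0, 1.5])

# With a logit link, the positive-class probability is the sigmoid of the
# score, and predict thresholds that probability at 0.5.
proba_pos = 1.0 / (1.0 + np.exp(-scores))
labels = (proba_pos >= 0.5).astype(int)
```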
- score(X, y, sample_weight=None)#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- term_importances(importance_type='avg_weight')#
Provides the term importances.
- Parameters:
importance_type – The type of term importance requested (‘avg_weight’, ‘min_max’).
- Returns:
An array of term importances, with one importance per additive term.
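A rough sketch of what these two importance modes could compute, using a made-up matrix of per-sample term contributions (illustrative only, not the library's internal representation):

```python
import numpy as np

# Made-up contributions of three additive terms over five samples
# (rows = samples, columns = terms).
contribs = np.array([
    [ 0.5, -0.1,  0.0],
    [-0.3,  0.2,  0.1],
    [ 0.4, -0.2,  0.0],
    [ 0.1,  0.3, -0.1],
    [-0.2,  0.1,  0.0],
])

# 'avg_weight'-style importance: mean absolute contribution per term.
avg_weight = np.abs(contribs).mean(axis=0)

# 'min_max'-style importance: range of each term's score over the samples.
min_max = contribs.max(axis=0) - contribs.min(axis=0)
```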
ExplainableBoostingRegressor#
- class interpret.glassbox.ExplainableBoostingRegressor(feature_names=None, feature_types=None, max_bins=256, max_interaction_bins=32, binning='quantile', mains='all', interactions=10, outer_bags=8, inner_bags=0, learning_rate=0.01, validation_size=0.15, early_stopping_rounds=50, early_stopping_tolerance=0.0001, max_rounds=5000, min_samples_leaf=2, max_leaves=3, n_jobs=-2, random_state=42)#
Explainable Boosting Regressor. The arguments will change in a future release; watch the changelog.
- Parameters:
feature_names – List of feature names.
feature_types – List of feature types.
max_bins – Max number of bins per feature for pre-processing stage on main effects.
max_interaction_bins – Max number of bins per feature for pre-processing stage on interaction terms. Only used if interactions is non-zero.
binning – Method to bin values for pre-processing. Choose “uniform”, “quantile”, or “rounded_quantile”. ‘rounded_quantile’ will round to as few decimals as possible while preserving the same bins as ‘quantile’.
mains – Features to be trained on in main effects stage. Either “all” or a list of feature indexes.
interactions – Interactions to be trained on. Either a list of lists of feature indices, or an integer for number of automatically detected interactions.
outer_bags – Number of outer bags.
inner_bags – Number of inner bags.
learning_rate – Learning rate for boosting.
validation_size – Validation set size for boosting.
early_stopping_rounds – Number of rounds of no improvement to trigger early stopping.
early_stopping_tolerance – Tolerance that dictates the smallest delta required to be considered an improvement.
max_rounds – Number of rounds for boosting.
min_samples_leaf – Minimum number of cases for tree splits used in boosting.
max_leaves – Maximum leaf nodes used in boosting.
n_jobs – Number of jobs to run in parallel.
random_state – Random state.
- decision_function(X)#
Predict scores from model before calling the link function.
- Parameters:
X – Numpy array for samples.
- Returns:
The sum of the additive term contributions.
- explain_global(name=None)#
Provides global explanation for model.
- Parameters:
name – User-defined explanation name.
- Returns:
An explanation object, visualizing feature-value pairs as a horizontal bar chart.
- explain_local(X, y=None, name=None)#
Provides local explanations for provided samples.
- Parameters:
X – Numpy array for X to explain.
y – Numpy vector for y to explain.
name – User-defined explanation name.
- Returns:
An explanation object, visualizing feature-value pairs for each sample as horizontal bar charts.
- explainer_type = 'model'#
Public facing EBM regressor.
- fit(X, y, sample_weight=None)#
Fits model to provided samples.
- Parameters:
X – Numpy array for training samples.
y – Numpy array as training labels.
sample_weight – Optional array of weights per sample. Should be same length as X and y.
- Returns:
Itself.
- get_params(deep=True)#
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- predict(X)#
Predicts on provided samples.
- Parameters:
X – Numpy array for samples.
- Returns:
Predicted value per sample.
- predict_and_contrib(X)#
Predicts on provided samples, returning predictions and explanations for each sample.
- Parameters:
X – Numpy array for samples.
- Returns:
Predictions and local explanations for each sample.
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
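The \(R^2\) definition above can be checked by hand with toy numbers:

```python
import numpy as np

# Toy true and predicted values, just to verify the formula.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
r2 = 1.0 - u / v
```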
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- term_importances(importance_type='avg_weight')#
Provides the term importances.
- Parameters:
importance_type – The type of term importance requested (‘avg_weight’, ‘min_max’).
- Returns:
An array of term importances, with one importance per additive term.
An array term importances with one importance per additive term